Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[HOTFIX] fix fwd trans conv MakeFwdCtxAndProblem() intenal API #2476

Closed
wants to merge 5 commits into from

Conversation

carlushuang
Copy link
Contributor

This is to fix issue : #2459
The root cause is not ValidateGroupCount, but some discrepancy in different conv APIs. Here we modify MakeFwdCtxAndProblem() API to match the conv fwd transposed requirement.

A quick review of transposed_conv :

in fwd we have N*C*Hi*Wi @ K*C*Y*X -> N*K*Ho*Wo. If this is a transposed conv, then it's actually is a backward conv, the layout of weight tensor need change to C*K*Y*X (swap the C and K).

According to pytorch standard https://pytorch.org/docs/stable/generated/torch.nn.ConvTranspose2d.html
the layout for weight for tansposed_conv is input_ch * output_ch * Y * X, this indicates that for transposed fwd conv, the only requirement is to swap weight C and K.

This is luckily done by the MIOpenDriver conv_driver.hpp Line : 1001~1011

if(mode == miopenTranspose)
{
    wei_lens[0] = wei_c_len;
    wei_lens[1] = wei_k_len / group_count;
}
else
{
    wei_lens[0] = wei_k_len;
    wei_lens[1] = wei_c_len / group_count;
}

The only requirement is when detecting transposed conv in conv-fwd, we need to set the proper argument to direct it to conv-bwd ( assuming the C/K swap is done by user or by MIOpenDriver )

This PR fix the bug for MakeFwdCtxAndProblem(), we do not need to swap the xDesc/yDesc if detecting this is a transposed, because if this is actually transposed-conv, we only need to set the direction to backward, then it's OK. This is also aligned with src/convolution_api.cpp miopenConvolutionForward(), the xDesc/yDesc do not need to swap
image

@carlushuang
Copy link
Contributor Author

However this PR is blocked by issue: #2474. I have a quick hotfix for this issue in PR:#2477

@atamazov
Copy link
Contributor

@carlushuang It's sad to say that, but this PR removes implementation of forward transposed convolution and replaces it by non-transposed one and therefore should not be merged ;) Please see e.g. PR 1804 in the private MIOpen repo for more info.

layout for weight for tansposed_conv is input_ch * output_ch * Y * X

That's correct. When we use output tensor (NKHW) as input (NC'HW), and use input (NCHW) as output (NK'HW), the layout of weights KCYX automatically considered as C'K'YX so we do not need to do anything (except applying Bwd conv instead of Fwd etc).

@carlushuang
Copy link
Contributor Author

Our implementation in code like MIOpenConvolutionForward (see the picture in the description of this PR) will not exchange x/y, while in the function MakeFwdCtxAndProblem() I changed in this PR changed the place, this is why I say there is a discrepancy. If we keep the behavior in this MakeFwdCtxAndProblem() function, I think we have to modify the transposed logic in other places, which I think is quite a lot

@JehandadKhan
Copy link
Collaborator

layout for weight for tansposed_conv is input_ch * output_ch * Y * X

That's correct. When we use output tensor (NKHW) as input (NC'HW), and use input (NCHW) as output (NK'HW), the layout of weights KCYX automatically considered as C'K'YX so we do not need to do anything (except applying Bwd conv instead of Fwd etc).

@atamazov I understand what you are saying, however, that does not align with the following change in MIOpenDriver conv_driver.hpp Line : 1001~1011

if(mode == miopenTranspose)
{
    wei_lens[0] = wei_c_len;
    wei_lens[1] = wei_k_len / group_count;
}
else
{
    wei_lens[0] = wei_k_len;
    wei_lens[1] = wei_c_len / group_count;
}

Since then the above transformation would happen inside MIOpen automatically as you mentioned.

@atamazov
Copy link
Contributor

@JehandadKhan

I understand what you are saying, however, that does not align with the following change in MIOpenDriver conv_driver.hpp Line : 1001~1011

Looks like I should give a more detailed explanation.

Generally, the implementation of something in the driver has nothing to do with the implementation in the library. The general requirement is that the driver must produce valid inputs, validate outputs correctly, and make valid API calls to the library, while the library must correctly interpret inputs and write valid outputs.

The requirement may be interpreted as an "alignment" between the library and the driver but I do not think this is really so.

How the driver produces valid layouts for transposed convolutions

The driver does not swap input and output tensors, but swaps K and C in the layout of weights instead (as you are correctly mentioning). The result is that tensor layouts comply with PyTorch requirements for the transposed convolution: NCHW in, NKHW out, CKYX weights. This is good, but has no effect on what happens in the library. The library must handle tensor layouts correctly by itself.

How the library handles the tensor layouts of transposed convolutions

As we know, the library gets NCHW in, CKYX weights and NKHW out. The idea is to swap input and output and use Forward instead of Backward (and vice versa). So what happens:

  • NCHW input tensor is interpreted as output, and therefore its layout is interpreted as NK'HW (K'=C)
  • K=C': NKHW output tensor is interpreted as input, and therefore its layout is interpreted as NC'HW (C'=K)
  • Consequently CKYX weights is interpreted as K'C'YX weights

As you can see, after swapping input and output tensors, we have normal NCHW layout internally and therefore we can leverage existing normal convolutions (but in opposite directions) to implement the transposed ones.

/cc @junliume @carlushuang

@atamazov
Copy link
Contributor

@carlushuang

Our implementation in code like MIOpenConvolutionForward (see the picture in the description of this PR) will not exchange x/y

x and y are swapped.

Please compare normal Backward call:
https://github.com/ROCmSoftwarePlatform/MIOpen/blob/b5c9cd5b0fa65bc77004dd59adcbb336ead031af/src/convolution_api.cpp#L1162-L1173
with Transposed Forward:
https://github.com/ROCmSoftwarePlatform/MIOpen/blob/b5c9cd5b0fa65bc77004dd59adcbb336ead031af/src/convolution_api.cpp#L569-L580

@atamazov
Copy link
Contributor

I am getting bogged down in discussions, but still keep the hope that is would help someone.

@carlushuang
Copy link
Contributor Author

@carlushuang

Our implementation in code like MIOpenConvolutionForward (see the picture in the description of this PR) will not exchange x/y

x and y are swapped.

The x/y swap I mean is between forward, and forward+trans(which calls into bwd code), these 2 parts are swapped or not. So actually it is not. The code change I did in this PR is also keep the same behavior, that not swap x/y in fwd and fwd+trans case.

@@ -43,6 +43,7 @@
MIOPEN_DECLARE_ENV_VAR(MIOPEN_CONV_PRECISE_ROCBLAS_TIMING)

#define WORKAROUND_MIOPENGEMM_ISSUE_59 1
#define WORKAROUND_MIOPENGEMM_ISSUE_2474 1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@carlushuang It's likely unnecessary since in the release dockers we are not seeing these issues.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should not push that commit related to bwd-gemm on my local machine. Now that commit is reverted

@carlushuang
Copy link
Contributor Author

@junliume thanks, I revert that commit related to bwd-gemm now

Copy link
Contributor

@atamazov atamazov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NO GO sorry.

Comment on lines +59 to +60
auto problem = ProblemDescription{
miopen::deref(xDesc), miopen::deref(wDesc), miopen::deref(yDesc), conv, direction};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@atamazov Is there any alternative we can compare as a quick patch fix?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@junliume Fix is done, in local testing. ETA for PR is tomorrow morning

@JehandadKhan
Copy link
Collaborator

NO GO sorry.

@atamazov Sounds good, lets wait for your fix!

@junliume
Copy link
Collaborator

replaced by #2487 and agreed by author. Keep this page for valuable information.

@junliume junliume closed this Oct 27, 2023
@junliume junliume deleted the fix_fwd_trans_get_workspace branch February 6, 2024 20:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants